This script will not run without the actual data downloaded from ADNI. Please run “DA5030.Proj.Bryant.Simulated_ADNI.Rmd” to see a demonstration of the code’s functionality. This file has been submitted to show the exact code I used to generate my GitHub Pages website from the actual ADNI data.
This script and its output use real data downloaded from the Alzheimer’s Disease Neuroimaging Initiative. The ADNI data use agreement requires that data not be published externally, so I cannot directly share the data on which I ran my own analysis. However, I have simulated the data using a structure identical to that of the data downloaded from ADNI and published these simulated datasets as CSVs at [my GitHub repo](https://github.com/anniegbryant/DA5030_Final_Project). Any reader who wishes to access the actual dataset used for my analysis should register for a (free) ADNI account and refer to the specific data files described in Data Understanding.
Additionally, this project has been published as a GitHub Pages website containing figures and analyses created with the actual ADNI dataset, which can be found here: https://anniegbryant.github.io/DA5030_Final_Project/
Lastly, the Shiny app deployed based on this project can be found here: https://annie-bryant.shinyapps.io/TauPET_Shiny_App_Notebook/
For my final project for DA5030 “Data Mining and Machine Learning”, my objective is to leverage neuroimaging-based data to predict cognitive decline in subjects along the cognitive spectrum, from cognitively unimpaired to severe dementia. The goal is to identify specific brain regions that, when burdened by Alzheimer’s Disease-related pathology, confer predictive power for cognitive status as measured via neuropsychological assessment. Ideally, I would like to identify the regions of interest (ROIs) in the brain that change the most with declining cognitive ability and to refine a set of ROIs that collectively predict changes in cognitive assessment scores. This will (tentatively) be regarded as a success if one or more ROIs can explain more than 50% of the variance in cognitive assessment scores (i.e., R\(^2\) > 0.5).
I will focus on one specific form of neuroimaging: Positron Emission Tomography (PET). PET imaging enables the visualization of specific molecular substrates in the brain through radioactively labeled tracers that bind the target substrate. In this case, I have chosen to focus on PET tracers that bind to the protein tau, which exhibits characteristic misfolding in Alzheimer’s Disease (AD). Misfolded tau not only loses its normal function but also aggregates into intracellular neurofibrillary tangles (NFTs) that can disrupt neuronal signaling and promote neurodegeneration. This phenomenon typically follows an archetypal spreading pattern, beginning in the entorhinal cortex, progressing to the hippocampus and amygdala, and then spreading beyond the medial temporal lobe to the limbic system and on to the neocortex. This staging pattern was described in the seminal paper published by Braak & Braak in 1991; the stages of tau NFT pathology progression are now known as the Braak stages, of which there are six in total.
Such staging has traditionally been possible only at autopsy, as it requires careful immunohistochemical staining of several brain regions by an experienced neuropathologist. However, recent years have seen the development of tau-PET tracers that are specific to misfolded NFT tau. One tracer in particular, 18F-AV-1451, has become widely used in the last few years as a non-invasive biomarker to measure regional accumulation of tau in the human brain. Tau-PET uptake correlates well with typical postmortem Braak staging patterns (Schwarz et al. 2016) as well as with cognitive status (Zhao et al. 2019). Recent studies have applied machine learning algorithms to tau-PET neuroimaging, along with other (relatively) non-invasive biomarkers including amyloid-beta PET and cerebrospinal fluid (CSF) protein measurements, to predict the onset of dementia (Mishra et al. 2017) or the spread of tau NFT pathology in the brain (Vogel et al. 2019, 2020). However, longitudinal analysis of tau-PET accumulation and its relationship to cognition remains relatively unexplored, largely because tau-PET tracers were developed so recently.
Through my role as a research assistant at the MassGeneral Institute for Neurodegenerative Disease, I have previously worked with the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data repository. ADNI is a tremendous resource for imaging-based and molecular biomarker data acquired from thousands of research participants across the country (see Acknowledgments for more information). In 2016, ADNI incorporated 18F-AV-1451 tau-PET neuroimaging into its imaging protocol and has since amassed well over a thousand tau-PET scans. Researchers at UC Berkeley have processed many of these images, quantified regional uptake of the tau-PET tracer, and generously shared their regional tau-PET data for ADNI collaborators to access. ADNI has also compiled cognitive assessment scores for each subject. I will use these two resources to develop individual regression models as well as an ensemble model to predict cognitive decline as a function of pathological tau NFT accumulation throughout the brain.
The only constraint is that I cannot directly share the full dataset as downloaded from ADNI, though I encourage anyone interested in gaining access to register for free at http://adni.loni.usc.edu/. Instead, I wrote a custom encryption function to simulate fake subject IDs, exam dates, tau-PET uptake values, region-of-interest volumes, ages, sexes, and CDR Sum of Boxes scores based on the existing data distributions. These simulated datasets are hosted in my GitHub repo. Please note that this file uses the actual ADNI data.
My goal in this analysis is to develop a model that can predict change in cognitive status through some combination (linear or nonlinear) of multiple brain regions, each of which exhibits a different change in tau-PET uptake. In doing so, I also hope to identify which region(s) of the brain are most prone to accumulating tau NFT pathology as measured via PET and, in turn, which region(s) best predict cognitive decline.
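Because the analysis centers on change over time, one simple way to quantify a subject’s change in regional tau-PET uptake is an annualized rate: the difference between the last and first SUVR divided by the number of years between those scans. The base-R sketch below assumes a data frame with `RID`, `EXAMDATE` (a `Date`), and one SUVR column, mirroring the tau-PET CSV described in the next section. The helper `annualized_change()` and the toy values are hypothetical, and a regression slope across all visits would be an equally reasonable definition of change.

```r
# Hypothetical helper: annualized change in one SUVR column per subject,
# using only each subject's first and last scan.
annualized_change <- function(df, value_col) {
  df <- df[order(df$RID, df$EXAMDATE), ]
  per_subject <- lapply(split(df, df$RID), function(s) {
    if (nrow(s) < 2) return(NULL)  # a single scan has no change to measure
    first <- s[1, ]
    last  <- s[nrow(s), ]
    years <- as.numeric(difftime(last$EXAMDATE, first$EXAMDATE, units = "days")) / 365.25
    data.frame(RID = first$RID, years = years,
               rate = (last[[value_col]] - first[[value_col]]) / years)
  })
  do.call(rbind, per_subject)  # NULL entries (single-scan subjects) are dropped
}

# Toy example (illustrative values, not ADNI data)
toy <- data.frame(
  RID = c(1, 1, 2, 2, 3),
  EXAMDATE = as.Date(c("2018-01-01", "2019-01-01", "2018-06-01", "2019-12-01", "2018-03-01")),
  BRAAK12_SUVR = c(1.5, 1.6, 2.0, 2.3, 1.8)
)
annualized_change(toy, "BRAAK12_SUVR")
```

Subject 3 has only one scan and is therefore excluded; subjects 1 and 2 come out at roughly 0.1 and 0.2 SUVR units per year.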
The target feature in this project is a continuous score on a cognitive assessment (CDR Sum of Boxes; see Data Understanding). Therefore, models will be evaluated based on their root mean squared error (RMSE) and the R\(^2\) between predicted and actual cognitive scores. I have set a benchmark of success at R\(^2\) > 0.5, meaning the model explains more than 50% of the variance in cognitive score changes. This is an ambitious threshold, as cognitive status is multifactorial and certainly modulated by more than regional tau accumulation, but it will help distinguish stronger from weaker predictive models.
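As a concrete reference for the two evaluation metrics, the sketch below computes RMSE and R\(^2\) in base R. It assumes R\(^2\) is taken as the squared Pearson correlation between predicted and observed scores, which is, to my understanding, what `caret::postResample()` reports as `Rsquared`. The traditional definition, 1 - SS_res/SS_tot, can give a different value, as the toy numbers show. The helper names and values are illustrative only, not ADNI data.

```r
# Root mean squared error between observed and predicted scores
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# R-squared as the squared Pearson correlation of predictions with observations
r2_cor <- function(obs, pred) cor(obs, pred)^2

# Traditional coefficient of determination: 1 - SS_res / SS_tot
r2_trad <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Illustrative toy values (not ADNI data)
obs  <- c(0, 1, 2, 3)
pred <- c(0.5, 1, 2, 2.5)

rmse(obs, pred)     # sqrt(0.125), about 0.354
r2_cor(obs, pred)   # 0.98
r2_trad(obs, pred)  # 0.9
```

The gap arises because the correlation-based version ignores systematic bias or rescaling in the predictions (here, predictions shrunk toward the mean), so it is worth stating which definition is used when comparing a model against the 0.5 benchmark.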
```r
# General data wrangling
library(tidyverse)
library(knitr)
library(kableExtra)
library(DT)
library(lubridate)
library(readxl)
library(fakeR)
# Modeling
library(factoextra)
library(FactoMineR)
library(glmnet)
library(caret)
library(ranger)
library(caretEnsemble)
library(Hmisc)
# Visualization
library(plotly)
library(forcats)
library(ggsignif)
library(ggcorrplot)
library(psych)
library(GGally)
library(gridExtra)
library(colorRamps)
library(RColorBrewer)
library(colorspace)
library(NeuralNetTools)
library(ggplotify)
library(igraph)
# ggseg is used to visualize the brain
# remotes::install_github("LCBC-UiO/ggseg")
# If that doesn't work:
# download.file("https://github.com/LCBC-UiO/ggseg/archive/master.zip", "ggseg.zip")
# unzip("ggseg.zip")
# devtools::install_local("ggseg-master")
library(ggseg)
# remotes::install_github("LCBC-UiO/ggseg3d")
library(ggseg3d)
# remotes::install_github("LCBC-UiO/ggsegExtra")
library(ggsegExtra)
```
The longitudinal tau-PET dataset was downloaded as a CSV from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Study Data repository located at Study Data/Imaging/PET Image Analysis/UC Berkeley - AV1451 Analysis [ADNI2,3] (version: 5/12/2020). This CSV file contains 1,121 rows and 165 columns (164 after the update_stamp column is dropped below). Note: ADNI data are freely accessible to all registered users. Please see my Acknowledgments page for more information about ADNI and its contributors.
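On my end, I load the partial-volume-corrected (PVC) regional tau-PET data as downloaded from ADNI:

```r
tau.df <- read.csv("../ADNI_Data/Raw_Data/UCBERKELEYAV1451_PVC_05_12_20.csv")
# Exam dates are stored as month/day/year strings
tau.df$EXAMDATE <- as.Date(tau.df$EXAMDATE, format = "%m/%d/%Y")
# The update stamp is irrelevant to this analysis, so drop it
# (dplyr:: is explicit because several loaded packages export their own functions)
tau.df <- dplyr::select(tau.df, -update_stamp)
```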
However, since I can’t share the tau-PET data directly from ADNI, I’ve “simulated” this dataset using a custom encryption function (sourced locally from encrypt_df.R, not on GitHub) that modifies the columns as follows:
This way, the simulated data retain approximately the same structure and distributions as the data I downloaded from ADNI, without sharing any information traceable to a specific individual.

```r
# NOT RUN: shown only to document how the simulated dataset was generated.
# encrypt_df.R (which defines encrypt_pet()) is kept locally and is not on GitHub.
source("encrypt_df.R")
tau.df <- encrypt_pet(tau.df)
# Round subject IDs to whole numbers
tau.df$RID <- round(tau.df$RID)
write.csv(tau.df, "Simulated_ADNI_TauPET.csv", row.names = FALSE)
```
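Because encrypt_df.R is not shared, the snippet below is only a hypothetical illustration of the general idea of perturbing numeric columns around their observed spread; it is not the `encrypt_pet()` function used above. It adds Gaussian noise scaled to a fraction of each numeric column’s standard deviation (`sd_frac`, an assumed parameter) and leaves the ID column untouched.

```r
# Hypothetical sketch only; this is NOT encrypt_pet() from encrypt_df.R.
# Adds Gaussian noise scaled to each numeric column's standard deviation.
jitter_numeric <- function(df, sd_frac = 0.05, exclude = "RID", seed = 1) {
  set.seed(seed)  # reproducible noise
  num_cols <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], exclude)
  for (col in num_cols) {
    x <- df[[col]]
    s <- sd(x, na.rm = TRUE)
    if (is.na(s)) s <- 0  # e.g., a column with fewer than two non-missing values
    df[[col]] <- x + rnorm(length(x), mean = 0, sd = sd_frac * s)  # NAs stay NA
  }
  df
}

# Toy input (illustrative values)
toy <- data.frame(RID = c(21L, 31L, 31L),
                  LEFT_AMYGDALA_SUVR = c(1.31, 1.54, 1.63),
                  LEFT_AMYGDALA_VOLUME = c(1367L, 1224L, 1224L))
jitter_numeric(toy)
```

One design caveat: noise drawn per row gives a subject’s repeated scans slightly different volumes, whereas in the real data a subject’s ROI volumes are often identical across visits. A closer imitation would draw the volume noise once per subject.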
Each row in the CSV represents one tau-PET scan (see the str call below). Some subjects had repeated scans separated by approximately one year, while others had only one scan. The first columns hold subject information: anonymized subject ID (RID), visit codes (VISCODE, VISCODE2), and PET exam date (EXAMDATE). The remaining columns encode regional volume and tau-PET uptake. Specifically, there are 80 regions of interest (ROIs), covering cortical and subcortical structures as well as composite regions such as the Braak-stage groupings, and each has a volume field (in mm\(^3\)) and a tau-PET uptake field, the Standardized Uptake Value Ratio (SUVR).
```r
str(tau.df)
```
## 'data.frame': 1120 obs. of 164 variables:
## $ RID : int 21 31 31 56 56 56 59 69 69 69 ...
## $ VISCODE : chr "init" "init" "y1" "init" ...
## $ VISCODE2 : chr "m144" "m144" "m156" "m144" ...
## $ EXAMDATE : Date, format: "2018-02-02" "2018-04-24" ...
## $ INFERIOR_CEREBGM_SUVR : num 1.32 1.33 1.33 1.28 1.24 ...
## $ INFERIOR_CEREBGM_VOLUME : int 52306 54296 54296 56750 56750 56750 59836 56862 56862 56862 ...
## $ HEMIWM_SUVR : num 1.02 0.85 0.866 1.138 1.196 ...
## $ HEMIWM_VOLUME : int 321220 281690 281690 336495 336495 336495 294422 463900 463900 463900 ...
## $ BRAAK12_SUVR : num 2.06 2.24 2.3 1.91 1.88 ...
## $ BRAAK12_VOLUME : int 10275 7587 7587 9376 9376 9376 10379 10981 10981 10981 ...
## $ BRAAK34_SUVR : num 1.95 1.87 1.8 1.82 1.77 ...
## $ BRAAK34_VOLUME : int 95661 95419 95419 92482 92482 92482 94092 112788 112788 112788 ...
## $ BRAAK56_SUVR : num 1.99 1.92 1.84 1.87 1.84 ...
## $ BRAAK56_VOLUME : int 284821 288136 288136 283119 283119 283119 283727 325054 325054 325054 ...
## $ BRAIN_STEM_SUVR : num 1.27 1.12 1.12 1.2 1.17 ...
## $ BRAIN_STEM_VOLUME : int 16955 16952 16952 20508 20508 20492 18057 18872 18872 18866 ...
## $ LEFT_MIDDLEFR_SUVR : num 2.02 1.93 1.8 1.83 1.78 ...
## $ LEFT_MIDDLEFR_VOLUME : int 17640 18517 18517 17164 17164 17164 17683 21907 21907 21907 ...
## $ LEFT_ORBITOFR_SUVR : num 2.17 2.03 1.92 2.11 1.98 ...
## $ LEFT_ORBITOFR_VOLUME : int 11676 10091 10091 11721 11721 11721 10917 12109 12109 12109 ...
## $ LEFT_PARSFR_SUVR : num 2.02 2.01 1.98 2.03 1.99 ...
## $ LEFT_PARSFR_VOLUME : int 9201 7799 7799 9185 9185 9185 7709 9813 9813 9813 ...
## $ LEFT_ACCUMBENS_AREA_SUVR : num 1.14 1.04 1.79 1.12 1.18 ...
## $ LEFT_ACCUMBENS_AREA_VOLUME : int 500 318 318 308 308 308 353 361 361 361 ...
## $ LEFT_AMYGDALA_SUVR : num 1.31 1.54 1.63 1.42 1.37 ...
## $ LEFT_AMYGDALA_VOLUME : int 1367 1224 1224 1561 1561 1561 993 1499 1499 1499 ...
## $ LEFT_CAUDATE_SUVR : num 2.08 1.46 1.34 1.95 1.83 ...
## $ LEFT_CAUDATE_VOLUME : int 3016 4890 4890 3083 3083 3083 2874 4049 4049 4049 ...
## $ LEFT_HIPPOCAMPUS_SUVR : num 2.12 1.96 2.2 1.69 1.73 ...
## $ LEFT_HIPPOCAMPUS_VOLUME : int 3822 3050 3050 3476 3476 3476 3603 3550 3550 3550 ...
## $ LEFT_PALLIDUM_SUVR : num 3.79 1.89 1.95 2.5 2.6 ...
## $ LEFT_PALLIDUM_VOLUME : int 444 2066 2066 1301 1301 1301 1081 1634 1634 1634 ...
## $ LEFT_PUTAMEN_SUVR : num 1.69 1.64 1.42 1.9 1.78 ...
## $ LEFT_PUTAMEN_VOLUME : int 4000 5675 5675 4832 4832 4832 3563 4891 4891 4891 ...
## $ LEFT_THALAMUS_PROPER_SUVR : num 1.45 1.32 1.24 1.54 1.53 ...
## $ LEFT_THALAMUS_PROPER_VOLUME : int 8226 6195 6195 7114 7114 7114 7561 7518 7518 7518 ...
## $ RIGHT_MIDDLEFR_SUVR : num 2.08 1.91 1.8 1.94 1.85 ...
## $ RIGHT_MIDDLEFR_VOLUME : int 17250 18440 18440 15605 15605 15605 16280 22586 22586 22586 ...
## $ RIGHT_ORBITOFR_SUVR : num 2.19 2.01 1.86 2.17 2.03 ...
## $ RIGHT_ORBITOFR_VOLUME : int 11614 12637 12637 11064 11064 11064 11537 12575 12575 12575 ...
## $ RIGHT_PARSFR_SUVR : num 2.17 2.08 1.9 2.09 2.01 ...
## $ RIGHT_PARSFR_VOLUME : int 9255 8131 8131 9641 9641 9641 8839 9119 9119 9119 ...
## $ RIGHT_ACCUMBENS_AREA_SUVR : num 1.41 1.65 1.66 1.01 1.07 ...
## $ RIGHT_ACCUMBENS_AREA_VOLUME : int 545 413 413 423 423 423 542 528 528 528 ...
## $ RIGHT_AMYGDALA_SUVR : num 1.18 1.79 1.89 1.37 1.44 ...
## $ RIGHT_AMYGDALA_VOLUME : int 1268 1028 1028 1464 1464 1464 1313 1797 1797 1797 ...
## $ RIGHT_CAUDATE_SUVR : num 2.01 1.57 1.37 1.96 1.89 ...
## $ RIGHT_CAUDATE_VOLUME : int 3179 4854 4854 2984 2984 2984 3224 3835 3835 3835 ...
## $ RIGHT_HIPPOCAMPUS_SUVR : num 2.01 2.09 2.03 1.62 1.64 ...
## $ RIGHT_HIPPOCAMPUS_VOLUME : int 3978 2723 2723 3489 3489 3489 3667 3942 3942 3942 ...
## $ RIGHT_PALLIDUM_SUVR : num 3.01 2.32 2.12 2.33 2.48 ...
## $ RIGHT_PALLIDUM_VOLUME : int 846 1531 1531 1262 1262 1262 1088 1552 1552 1552 ...
## $ RIGHT_PUTAMEN_SUVR : num 1.68 1.62 1.53 2.06 1.94 ...
## $ RIGHT_PUTAMEN_VOLUME : int 4322 5774 5774 4328 4328 4328 3190 4569 4569 4569 ...
## $ RIGHT_THALAMUS_PROPER_SUVR : num 1.42 1.33 1.24 1.52 1.55 ...
## $ RIGHT_THALAMUS_PROPER_VOLUME : int 5968 5442 5442 5940 5940 5940 6257 7899 7899 7899 ...
## $ CHOROID_SUVR : num 7.45 4.56 4.31 3.84 3.79 ...
## $ CHOROID_VOLUME : int 4180 3591 3591 3165 3165 3165 3717 3663 3663 3663 ...
## $ CTX_LH_BANKSSTS_SUVR : num 1.75 1.49 1.6 1.7 1.63 ...
## $ CTX_LH_BANKSSTS_VOLUME : int 1553 1633 1633 1812 1812 1812 1694 2601 2601 2601 ...
## $ CTX_LH_CAUDALANTERIORCINGULATE_SUVR : num 1.67 1.73 1.65 1.69 1.69 ...
## $ CTX_LH_CAUDALANTERIORCINGULATE_VOLUME : int 1138 1387 1387 1124 1124 1124 1465 1512 1512 1512 ...
## $ CTX_LH_CUNEUS_SUVR : num 2.33 2.2 2.05 2.01 2 ...
## $ CTX_LH_CUNEUS_VOLUME : int 2023 2702 2702 2429 2429 2429 2393 2222 2222 2222 ...
## $ CTX_LH_ENTORHINAL_SUVR : num 2.07 2.3 2.43 2.79 2.52 ...
## $ CTX_LH_ENTORHINAL_VOLUME : int 1468 1035 1035 1068 1068 1068 1297 1888 1888 1888 ...
## $ CTX_LH_FUSIFORM_SUVR : num 1.97 1.87 1.83 1.84 1.77 ...
## $ CTX_LH_FUSIFORM_VOLUME : int 7956 6997 6997 7694 7694 7694 7807 9083 9083 9083 ...
## $ CTX_LH_INFERIORPARIETAL_SUVR : num 1.99 1.95 1.94 1.85 1.89 ...
## $ CTX_LH_INFERIORPARIETAL_VOLUME : int 11656 10174 10174 9243 9243 9243 8180 9846 9846 9846 ...
## $ CTX_LH_INFERIORTEMPORAL_SUVR : num 2.16 1.97 2.05 2.1 2.02 ...
## $ CTX_LH_INFERIORTEMPORAL_VOLUME : int 6606 6418 6418 7286 7286 7286 6869 9599 9599 9599 ...
## $ CTX_LH_INSULA_SUVR : num 1.51 1.64 1.65 1.51 1.48 ...
## $ CTX_LH_INSULA_VOLUME : int 6711 4654 4654 6003 6003 6003 5513 6597 6597 6597 ...
## $ CTX_LH_ISTHMUSCINGULATE_SUVR : num 1.9 1.81 1.82 1.79 1.94 ...
## $ CTX_LH_ISTHMUSCINGULATE_VOLUME : int 2283 2215 2215 1549 1549 1549 1944 2264 2264 2264 ...
## $ CTX_LH_LATERALOCCIPITAL_SUVR : num 2.39 2.06 1.99 1.92 2 ...
## $ CTX_LH_LATERALOCCIPITAL_VOLUME : int 8532 10148 10148 8292 8292 8292 10612 9404 9404 9404 ...
## $ CTX_LH_LINGUAL_SUVR : num 2.27 1.95 1.97 1.76 1.74 ...
## $ CTX_LH_LINGUAL_VOLUME : int 4329 4658 4658 5606 5606 5606 5435 6488 6488 6488 ...
## $ CTX_LH_MIDDLETEMPORAL_SUVR : num 2.2 2.06 1.89 2.04 1.99 ...
## $ CTX_LH_MIDDLETEMPORAL_VOLUME : int 7445 8322 8322 7292 7292 7292 8031 9467 9467 9467 ...
## $ CTX_LH_PARACENTRAL_SUVR : num 1.99 1.79 1.8 1.91 1.8 ...
## $ CTX_LH_PARACENTRAL_VOLUME : int 2672 2890 2890 3231 3231 3231 3358 3173 3173 3173 ...
## $ CTX_LH_PARAHIPPOCAMPAL_SUVR : num 1.6 1.86 1.92 1.72 1.66 ...
## $ CTX_LH_PARAHIPPOCAMPAL_VOLUME : int 1659 1549 1549 1900 1900 1900 1989 2296 2296 2296 ...
## $ CTX_LH_PERICALCARINE_SUVR : num 2.23 1.45 1.41 1.56 1.54 ...
## $ CTX_LH_PERICALCARINE_VOLUME : int 1678 2004 2004 1866 1866 1866 1918 1927 1927 1927 ...
## $ CTX_LH_POSTCENTRAL_SUVR : num 2.03 1.81 1.82 1.85 1.78 ...
## $ CTX_LH_POSTCENTRAL_VOLUME : int 8281 8428 8428 8275 8275 8275 7580 8976 8976 8976 ...
## $ CTX_LH_POSTERIORCINGULATE_SUVR : num 1.82 1.89 1.84 1.72 1.67 ...
## $ CTX_LH_POSTERIORCINGULATE_VOLUME : int 2439 2608 2608 2683 2683 2683 2573 2638 2638 2638 ...
## $ CTX_LH_PRECENTRAL_SUVR : num 1.91 1.85 1.75 1.62 1.61 ...
## $ CTX_LH_PRECENTRAL_VOLUME : int 11174 12349 12349 10924 10924 10924 10820 12307 12307 12307 ...
## $ CTX_LH_PRECUNEUS_SUVR : num 1.93 1.89 1.94 1.81 1.81 ...
## $ CTX_LH_PRECUNEUS_VOLUME : int 7870 8313 8313 8387 8387 8387 8311 8584 8584 8584 ...
## $ CTX_LH_ROSTRALANTERIORCINGULATE_SUVR : num 1.71 1.58 1.49 1.59 1.48 ...
## $ CTX_LH_ROSTRALANTERIORCINGULATE_VOLUME: int 2928 2448 2448 1695 1695 1695 2466 2915 2915 2915 ...
## $ CTX_LH_SUPERIORFRONTAL_SUVR : num 1.86 1.86 1.74 1.84 1.77 ...
## [list output truncated]
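Because every ROI contributes a paired `_SUVR` and `_VOLUME` column, it can be convenient to reshape the wide table into a long one, with one row per scan and ROI, before computing region-wise summaries or plots. Below is a minimal base-R sketch on a toy subset whose values are copied from the `str()` output above. With the tidyverse, `tidyr::pivot_longer()` with `names_to = c("ROI", ".value")` and `names_pattern = "(.*)_(SUVR|VOLUME)"` should produce an equivalent layout.

```r
# Toy subset of tau.df (first two scans, two ROIs; values from the str() output)
toy <- data.frame(
  RID = c(21L, 31L),
  VISCODE = c("init", "init"),
  LEFT_AMYGDALA_SUVR = c(1.31, 1.54),
  LEFT_AMYGDALA_VOLUME = c(1367L, 1224L),
  RIGHT_AMYGDALA_SUVR = c(1.18, 1.79),
  RIGHT_AMYGDALA_VOLUME = c(1268L, 1028L),
  stringsAsFactors = FALSE
)

# ROI names are the SUVR column names minus the "_SUVR" suffix
rois <- sub("_SUVR$", "", grep("_SUVR$", names(toy), value = TRUE))

# Stack one small data frame per ROI: scan identifiers plus that ROI's SUVR and volume
long <- do.call(rbind, lapply(rois, function(roi) {
  data.frame(RID = toy$RID, VISCODE = toy$VISCODE, ROI = roi,
             SUVR = toy[[paste0(roi, "_SUVR")]],
             VOLUME = toy[[paste0(roi, "_VOLUME")]],
             stringsAsFactors = FALSE)
}))
long
```

Applied to the full table (carrying VISCODE2 and EXAMDATE along as well), this would be expected to yield 80 ROIs × 1,120 scans = 89,600 rows.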
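The SUVR value is normalized to tau-PET uptake in the inferior cerebellar gray matter, a commonly used reference region for tau-PET given the lack of inferior cerebellar tau pathology in Alzheimer’s Disease. The cerebellar cortex is highlighted in blue below; because the aseg atlas does not subdivide the cerebellar cortex, the whole structure is shown as an approximation of the inferior cerebellar reference region.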
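```r
# Flag the left and right cerebellar cortex (val = 1) against all other aseg regions (val = 0)
aseg_3d %>%
  unnest(ggseg_3d) %>%   # expand the nested 3D mesh table into one row per region
  ungroup() %>%
  select(region) %>%
  na.omit() %>%
  mutate(val = ifelse(region %in% c("Right-Cerebellum-Cortex", "Left-Cerebellum-Cortex"), 1, 0)) %>%
  # Draw val = 0 regions as transparent and the cerebellar cortex in blue
  ggseg3d(atlas = aseg_3d, label = "region", text = "val", colour = "val", na.alpha = 0.5,
          palette = c("transparent", "deepskyblue3"), show.legend = FALSE) %>%
  add_glassbrain() %>%   # overlay a translucent cortical surface for anatomical context
  pan_camera("left lateral") %>%
  remove_axes()
```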
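For reference, an SUVR is a ratio: mean tracer uptake in the target ROI divided by mean uptake in the reference region, so a value of 2 means twice the reference uptake. The toy sketch below (made-up voxel values, hypothetical helper `suvr()`) shows the arithmetic. Note that in this PVC file the `INFERIOR_CEREBGM_SUVR` column is not 1 (e.g., 1.32 for the first scan), so the normalization behind the PVC values may involve more than this simple ratio; the UC Berkeley analysis methods documentation distributed through ADNI should be consulted for the exact procedure.

```r
# Hypothetical helper: SUVR as the ratio of mean ROI uptake to mean reference uptake
suvr <- function(roi_uptake, ref_uptake) mean(roi_uptake) / mean(ref_uptake)

# Made-up voxel intensities (arbitrary units), for illustration only
roi_voxels <- c(4.2, 3.8, 4.0)   # mean 4.0
ref_voxels <- c(2.0, 2.1, 1.9)   # mean 2.0

suvr(roi_voxels, ref_voxels)  # 2
suvr(ref_voxels, ref_voxels)  # 1: the reference region normalized to itself
```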